week 7: multilevel models

multilevel adventures

slopes

Let’s start by simulating the cafe data.

# ---- set population-level parameters -----
a <- 3.5       # average morning wait time
b <- (-1)      # average difference afternoon wait time
sigma_a <- 1   # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (-0.7)  # correlation between intercepts and slopes

# ---- create vector of means ----
Mu <- c(a, b)

# ---- create matrix of variances and covariances ----
sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix
# now matrix multiply to get covariance matrix
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)
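As a quick sanity check (not part of the original code), the matrix product `diag(sigmas) %*% Rho %*% diag(sigmas)` should reproduce the covariance matrix element by element: variances on the diagonal, `rho * sigma_a * sigma_b` off the diagonal.

```r
# rebuild the covariance matrix from the parameters above
sigma_a <- 1; sigma_b <- 0.5; rho <- -0.7
sigmas <- c(sigma_a, sigma_b)
Rho <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

# element by element, diag(s) %*% R %*% diag(s) is:
#   [ sigma_a^2             rho*sigma_a*sigma_b ]
#   [ rho*sigma_a*sigma_b   sigma_b^2           ]
Sigma
```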

# ---- simulate intercepts and slopes -----
N_cafes = 20
library(MASS)
set.seed(5)
vary_effects <- mvrnorm( n=N_cafes, mu = Mu, Sigma=Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]

# ---- simulate observations -----

set.seed(22)
N_visits <- 10
afternoon <- rep(0:1,N_visits*N_cafes/2)
cafe_id <- rep( 1:N_cafes , each=N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id]*afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm( N_visits*N_cafes , mu , sigma )
d <- data.frame( cafe=cafe_id , afternoon=afternoon , wait=wait )
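As a check on the simulation (a sketch with an illustrative, much larger sample size than the 20 cafes above): with many draws from `mvrnorm`, the sample correlation between intercepts and slopes should land near `rho`. With only 20 cafes, the sample correlation will wander further from -0.7.

```r
library(MASS)

# population parameters as above
a <- 3.5; b <- -1
sigma_a <- 1; sigma_b <- 0.5; rho <- -0.7
Sigma <- diag(c(sigma_a, sigma_b)) %*%
  matrix(c(1, rho, rho, 1), nrow = 2) %*%
  diag(c(sigma_a, sigma_b))

set.seed(5)
big <- mvrnorm(n = 1e5, mu = c(a, b), Sigma = Sigma)  # 1e5 is illustrative
cor(big[, 1], big[, 2])  # close to -0.7
```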

a simulation note from RM

In this exercise, we are simulating data from a generative process and then analyzing those data with a model that exactly matches the structure of that process. But in the real world, we’re never so lucky. Instead we are always forced to analyze data with a model that is MISSPECIFIED: the true data-generating process is different from the model. Simulation can, however, be used to explore misspecification. Just simulate data from a process and then see how a number of models, none of which exactly matches the data-generating process, perform. And always remember that Bayesian inference does not depend on data-generating assumptions, such as the likelihood, being true. Non-Bayesian approaches may depend upon sampling distributions for their inferences, but this is not the case for a Bayesian model. In a Bayesian model, a likelihood is a prior for the data, and inference about parameters can be surprisingly insensitive to its details.
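A small illustration of that last point (a sketch with made-up values, not from the lecture): simulate skewed data from a Gamma process, then estimate its mean with a normal-likelihood model. The likelihood is wrong, but inference about the mean is still reasonable.

```r
set.seed(11)
true_mean <- 3.5                                   # hypothetical "true" average wait
# true process: right-skewed Gamma, not Normal
y <- rgamma(1e4, shape = 4, rate = 4 / true_mean)  # E[y] = shape/rate = 3.5

# misspecified analysis model: intercept-only model with normal errors
fit <- lm(y ~ 1)
coef(fit)  # the estimated mean lands near 3.5 anyway
```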

Mathematical model:

likelihood function and linear model

\[\begin{align*} W_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{CAFE[i]} + \beta_{CAFE[i]}A_i \end{align*}\]

varying intercepts and slopes

\[\begin{align*} \begin{bmatrix} \alpha_{CAFE[i]} \\ \beta_{CAFE[i]} \end{bmatrix} &\sim \text{MVNormal}\left( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \mathbf{S}\right) \\ \mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix} \end{align*}\]

priors

\[\begin{align*} \alpha &\sim \text{Normal}(5,2) \\ \beta &\sim \text{Normal}(-1,0.5) \\ \sigma &\sim \text{Exponential}(1) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\beta} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

LKJ correlation prior

Code
# examples
rlkj_1 = rethinking::rlkjcorr(1e4, K=2, eta=1)
rlkj_2 = rethinking::rlkjcorr(1e4, K=2, eta=2)
rlkj_4 = rethinking::rlkjcorr(1e4, K=2, eta=4)
rlkj_6 = rethinking::rlkjcorr(1e4, K=2, eta=6)
data.frame(rlkj_1= rlkj_1[,1,2], 
           rlkj_2= rlkj_2[,1,2], 
           rlkj_4= rlkj_4[,1,2],
           rlkj_6= rlkj_6[,1,2]) %>% 
  ggplot() +
  geom_density(aes(x=rlkj_1, color = "1"), alpha=.3) +
  geom_density(aes(x=rlkj_2, color = "2"), alpha=.3) +
  geom_density(aes(x=rlkj_4, color = "4"), alpha=.3) +
  geom_density(aes(x=rlkj_6, color = "6"), alpha=.3) +
  labs(x="R", color="eta") +
  theme(legend.position = "top")
m1 <- brm(
  data = d,
  family = gaussian,
  wait ~ 1 + afternoon + (1 + afternoon | cafe),
  prior = c(
    prior( normal(5,2),    class=Intercept ), 
    prior( normal(-1, .5), class=b),
    prior( exponential(1), class=sd),
    prior( exponential(1), class=sigma),
    prior( lkj(2),         class=cor)
  ), 
  iter=2000, warmup=1000, chains=4, cores=4, seed=9,
  file=here("files/models/72.1")
)
posterior_summary(m1) %>% round(3)
                               Estimate Est.Error     Q2.5    Q97.5
b_Intercept                       3.663     0.225    3.213    4.107
b_afternoon                      -1.131     0.141   -1.409   -0.848
sd_cafe__Intercept                0.966     0.168    0.707    1.366
sd_cafe__afternoon                0.591     0.123    0.387    0.869
cor_cafe__Intercept__afternoon   -0.506     0.182   -0.799   -0.096
sigma                             0.474     0.028    0.424    0.533
Intercept                         3.098     0.200    2.705    3.497
r_cafe[1,Intercept]               0.552     0.293   -0.009    1.139
r_cafe[2,Intercept]              -1.505     0.301   -2.104   -0.896
r_cafe[3,Intercept]               0.704     0.299    0.118    1.300
r_cafe[4,Intercept]              -0.420     0.292   -0.981    0.148
r_cafe[5,Intercept]              -1.788     0.298   -2.362   -1.201
r_cafe[6,Intercept]               0.597     0.294    0.022    1.178
r_cafe[7,Intercept]              -0.046     0.295   -0.635    0.548
r_cafe[8,Intercept]               0.282     0.299   -0.308    0.861
r_cafe[9,Intercept]               0.316     0.294   -0.281    0.903
r_cafe[10,Intercept]             -0.100     0.294   -0.683    0.490
r_cafe[11,Intercept]             -1.736     0.290   -2.310   -1.173
r_cafe[12,Intercept]              0.178     0.290   -0.398    0.758
r_cafe[13,Intercept]              0.218     0.298   -0.375    0.822
r_cafe[14,Intercept]             -0.488     0.299   -1.084    0.095
r_cafe[15,Intercept]              0.792     0.302    0.217    1.394
r_cafe[16,Intercept]             -0.273     0.294   -0.859    0.316
r_cafe[17,Intercept]              0.554     0.297   -0.026    1.154
r_cafe[18,Intercept]              2.085     0.298    1.515    2.676
r_cafe[19,Intercept]             -0.416     0.296   -1.001    0.156
r_cafe[20,Intercept]              0.066     0.298   -0.526    0.647
r_cafe[1,afternoon]              -0.026     0.289   -0.600    0.535
r_cafe[2,afternoon]               0.227     0.295   -0.360    0.792
r_cafe[3,afternoon]              -0.797     0.297   -1.392   -0.236
r_cafe[4,afternoon]              -0.102     0.282   -0.655    0.430
r_cafe[5,afternoon]               0.995     0.296    0.416    1.572
r_cafe[6,afternoon]              -0.164     0.282   -0.721    0.376
r_cafe[7,afternoon]               0.103     0.281   -0.465    0.642
r_cafe[8,afternoon]              -0.502     0.293   -1.088    0.075
r_cafe[9,afternoon]              -0.175     0.284   -0.731    0.376
r_cafe[10,afternoon]              0.178     0.291   -0.387    0.754
r_cafe[11,afternoon]              0.701     0.285    0.153    1.274
r_cafe[12,afternoon]             -0.055     0.286   -0.633    0.498
r_cafe[13,afternoon]             -0.678     0.292   -1.248   -0.122
r_cafe[14,afternoon]              0.193     0.287   -0.361    0.785
r_cafe[15,afternoon]             -1.061     0.310   -1.649   -0.437
r_cafe[16,afternoon]              0.090     0.283   -0.470    0.638
r_cafe[17,afternoon]             -0.089     0.283   -0.637    0.467
r_cafe[18,afternoon]              0.103     0.302   -0.487    0.703
r_cafe[19,afternoon]              0.871     0.299    0.294    1.471
r_cafe[20,afternoon]              0.075     0.293   -0.492    0.664
lprior                           -5.061     0.436   -6.072   -4.356
lp__                           -197.199     7.158 -212.067 -184.274

Let’s get the slopes and intercepts for each cafe.

Code
intercepts = coef(m1)$cafe[ ,, "Intercept"]
slopes = coef(m1)$cafe[,, "afternoon"]
cafe_params = data.frame(
  cafe=1:20,
  intercepts=intercepts[, 1],
  slopes=slopes[, 1]
) 
cafe_params %>% round(3)
   cafe intercepts slopes
1     1      4.215 -1.156
2     2      2.158 -0.904
3     3      4.367 -1.928
4     4      3.243 -1.232
5     5      1.875 -0.135
6     6      4.260 -1.294
7     7      3.617 -1.027
8     8      3.945 -1.633
9     9      3.979 -1.305
10   10      3.563 -0.953
11   11      1.927 -0.430
12   12      3.841 -1.186
13   13      3.881 -1.809
14   14      3.175 -0.938
15   15      4.455 -2.192
16   16      3.390 -1.040
17   17      4.217 -1.219
18   18      5.748 -1.028
19   19      3.247 -0.260
20   20      3.729 -1.056
Code
cafe_params %>% 
  ggplot( aes(x=intercepts, y=slopes) ) +
  geom_point(size = 2) 
Code
cafe_params %>% 
  ggplot( aes(x=intercepts, y=slopes) ) +
  stat_ellipse() +
  geom_point(size = 2) 
Code
cafe_params %>% 
  ggplot( aes(x=intercepts, y=slopes) ) +
  mapply(function(level) {
    stat_ellipse(geom  = "polygon", type = "norm",
                 linewidth = 0, alpha = .1, fill = "#1c5253",
                 level = level)
    }, 
    # enter the levels here
    level = c(1:9 / 10, .99)) +
  geom_point(size = 2) 

More about stat_ellipse here.

exercise

Now use the slopes and intercepts to calculate the expected morning and afternoon wait times for each cafe. Plot these as a scatterplot. Bonus for ellipses.

Code
cafe_params %>% 
  mutate(
    morning = intercepts, 
    afternoon = intercepts + slopes
  ) %>% 
  ggplot( aes(x=morning, y=afternoon) ) +
  mapply(function(level) {
    stat_ellipse(geom  = "polygon", type = "norm",
                 linewidth = 0, alpha = .1, fill = "#1c5253",
                 level = level)
    }, 
    # enter the levels here
    level = c(1:9 / 10, .99)) +
  geom_point(size = 2)+
  labs(x="morning wait time",
       y="afternoon wait time")

What is the correlation between our intercepts and slopes?

Code
post = as_draws_df(m1)
rlkj_2 = rethinking::rlkjcorr(nrow(post), K=2, eta=2)

data.frame(prior= rlkj_2[,1,2],
           posterior = post$cor_cafe__Intercept__afternoon) %>% 
  ggplot() +
  geom_density(aes(x=prior, color = "prior"), alpha=.3) +
  geom_density(aes(x=posterior, color = "posterior"), alpha=.3) +
  scale_color_manual(values = c("#1c5253" , "#e07a5f")) +
  labs(x="R") +
  theme(legend.position = "top")

predictors at different levels

Returning to our personality data from Tuesday.

data_path = "https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/mlm.csv"
d <- read.csv(data_path)

d %>% count(wave)
  wave  n
1    1 91
2    2 91
3    3 37
4    4  6
d %>% count(group)
  group   n
1  CTRL  61
2    PD 164
Code
d %>% 
  ggplot(aes(x=wave, y=con, group=id)) +
  geom_line(alpha=.3) +
  facet_wrap(~group)

wave is a within-person variable (Level 1), whereas group is a between-person variable (Level 2).
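One way to see the distinction (using a tiny made-up data frame, not the personality data): a Level-1 variable takes multiple values within a person, while a Level-2 variable is constant within a person.

```r
# hypothetical toy data: 3 people, 2 waves each
toy <- data.frame(
  id    = rep(1:3, each = 2),
  wave  = rep(1:2, times = 3),                  # varies within id -> Level 1
  group = rep(c("CTRL", "PD", "PD"), each = 2)  # constant within id -> Level 2
)

# number of distinct values per person
tapply(toy$wave,  toy$id, function(x) length(unique(x)))  # 2 2 2
tapply(toy$group, toy$id, function(x) length(unique(x)))  # 1 1 1
```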

Mathematical model with level-2 covariate

Likelihood function and linear model

\[\begin{align*} \text{con}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{id}[i]} + \beta_{\text{id}[i]}t_i \end{align*}\]

Varying intercepts and slopes with level-2 predictor:

\[\begin{align*} \alpha_{\text{id}[i]} &= \alpha + \gamma_\alpha G_{\text{id}[i]} + u_{\alpha,\text{id}[i]} \\ \beta_{\text{id}[i]} &= \beta + \gamma_\beta G_{\text{id}[i]} + u_{\beta,\text{id}[i]} \end{align*}\]

Random effects:

\[\begin{align*} \begin{bmatrix} u_{\alpha,\text{id}[i]} \\ u_{\beta,\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf{S}\right) \\ \mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix} \end{align*}\]

Priors (matching the brm code below, which puts a single normal(0,1) prior on the intercept and on all slope coefficients): \[\begin{align*} \alpha &\sim \text{Normal}(0,1) \\ \beta &\sim \text{Normal}(0,1) \\ \gamma_\alpha &\sim \text{Normal}(0,1) \\ \gamma_\beta &\sim \text{Normal}(0,1) \\ \sigma &\sim \text{Exponential}(1) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\beta} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]
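Substituting the level-2 equations into the linear model shows where each term of the regression formula comes from:

\[\begin{align*} \mu_i &= (\alpha + \gamma_\alpha G_{\text{id}[i]} + u_{\alpha,\text{id}[i]}) + (\beta + \gamma_\beta G_{\text{id}[i]} + u_{\beta,\text{id}[i]})t_i \\ &= \alpha + \beta t_i + \gamma_\alpha G_{\text{id}[i]} + \gamma_\beta G_{\text{id}[i]} t_i + u_{\alpha,\text{id}[i]} + u_{\beta,\text{id}[i]} t_i \end{align*}\]

The fixed part maps onto `1 + time + group + time:group`, and the person-specific residuals \(u_{\alpha}\) and \(u_{\beta}\) map onto `(1 + time | id)`.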

m2 <- brm(
  data=d,
  family=gaussian,
  con ~ 1 + time + group + time:group + (1 + time | id),
  prior = c( prior( normal(0,1),    class=Intercept ),
             prior( normal(0,1),    class=b ),
             prior( exponential(1), class=sigma ),
             prior( exponential(1), class=sd ),
             prior( lkj(2),         class=cor)),
  iter=4000, warmup=1000, seed=9, cores=4, chains=4,
  file=here("files/models/72.2")
)